SV plugin updates#70

Open
nakib103 wants to merge 2 commits into Ensembl:main from nakib103:sv_plugin

Conversation

nakib103 (Contributor) commented Mar 4, 2026

No description provided.

Comment on lines +73 to +75
SV_PLUGINS = [
    "CADD"
]
Member

Is CADD really the only plugin we want to run for structural variants? Should all the others be disabled, as this code does?

Contributor Author

The only other candidate is Phenotype, but we will have a separate data store for it soon.

Member

@nakib103 As the phenotype store is still in design, I'd say we should keep the current phenotype plugin enabled until we have access to the new data.

Contributor Author

The Phenotype plugin data (the GFF3s) does include phenotypes for structural variants. But I don't think anyone has ever tested or worked on the plugin to see whether it works with that data (you might get lots of phenotypes recorded against short variants attached to the SVs).
If you have already tested it and have a good hunch about it, I can add the plugin.
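
To illustrate the concern above, here is a minimal sketch that assumes phenotype records are matched to a variant purely by coordinate overlap. This is not the Phenotype plugin's actual matching logic, which is untested for SVs per the comment above. The `Feature` type, the `source_type` labels, and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A hypothetical phenotype record; not the plugin's real data model."""
    start: int
    end: int
    phenotype: str
    source_type: str  # "SV" or "short_variant" (assumed labels)


def overlapping(features, start, end):
    """Return features whose 1-based inclusive interval overlaps [start, end]."""
    return [f for f in features if f.start <= end and f.end >= start]


# Made-up records for illustration only.
features = [
    Feature(1_000, 1_000, "phenotype A", "short_variant"),
    Feature(2_500, 2_501, "phenotype B", "short_variant"),
    Feature(500, 9_000, "phenotype C", "SV"),
    Feature(20_000, 20_000, "phenotype D", "short_variant"),
]

# A 10 kb structural variant overlaps every record inside it,
# including those originally reported against short variants.
hits = overlapping(features, 1, 10_000)

# One possible mitigation (an assumption, not an existing plugin option):
# keep only records that were themselves reported against SVs.
sv_only = [f for f in hits if f.source_type == "SV"]
```

Under that assumption, the query returns the two short-variant records alongside the SV record, which is the kind of noise that would need checking before enabling the plugin for SVs.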

Comment on lines 325 to 344
Member

Suggested change
plugin_files = []
if structural_variant:
    sv = os.path.join(plugin_data_dir, "CADD_prescored_variants.tsv.gz")
    plugin_files = [sv]
else:
    if species == "sus_scrofa":
        snv = os.path.join(plugin_data_dir, "ALL_pCADD-PHRED-scores.tsv.gz")
        plugin_files = [snv]
    else:
        snv = os.path.join(
            plugin_data_dir, f"CADD_{assembly}_1.7_whole_genome_SNVs.tsv.gz"
        )
        indels = os.path.join(plugin_data_dir, f"CADD_{assembly}_1.7_InDels.tsv.gz")
        plugin_files = [snv, indels]
if len(plugin_files) > 0:
    check_plugin_files(plugin, plugin_files)
    return f"CADD,{','.join(plugin_files)}"
else:
    return ''

Member

The suggested change here should span lines 325 - 344, but for some reason the formatting doesn't come out right. (It's not just adding 21 lines but rather replacing the 19 existing ones with the 21 above.)

Contributor Author

I have added the suggestion with a slight modification.

Member

Why is the check_plugin_files call and the return-value formatting logic still repeated? My suggestion was mostly about not repeating that logic: it builds the plugin_files array in each branch, then calls check_plugin_files once and formats and returns the CLI arg on a single line.
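
The single-exit pattern described above can be sketched as a self-contained function. This is only an illustration under stated assumptions: the function name `cadd_plugin_arg` and its signature are hypothetical, and `check_plugin_files` is replaced by a stand-in because the repository's real helper is not shown in this thread.

```python
import os


def check_plugin_files(plugin, files):
    """Stand-in for the repository's helper (assumed to validate the files)."""
    for path in files:
        print(f"[{plugin}] would check {path}")


def cadd_plugin_arg(plugin_data_dir, assembly, species, structural_variant):
    """Build the CADD plugin CLI argument (hypothetical wrapper)."""
    plugin = "CADD"

    # Each branch only decides which file names apply.
    if structural_variant:
        names = ["CADD_prescored_variants.tsv.gz"]
    elif species == "sus_scrofa":
        names = ["ALL_pCADD-PHRED-scores.tsv.gz"]
    else:
        names = [
            f"CADD_{assembly}_1.7_whole_genome_SNVs.tsv.gz",
            f"CADD_{assembly}_1.7_InDels.tsv.gz",
        ]
    plugin_files = [os.path.join(plugin_data_dir, name) for name in names]

    # Validation and formatting happen exactly once, after the branching.
    if not plugin_files:
        return ""
    check_plugin_files(plugin, plugin_files)
    return f"{plugin},{','.join(plugin_files)}"
```

Adding a new file set then means adding one branch that sets `names`, with no further calls to `check_plugin_files` or extra return statements.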

Contributor Author

Ah, that might be right; it was hard to read with the wrapped lines. I kept it this way for better readability.

2 participants